| Date | Time | Activity |
|---|---|---|
| Mon 03/11/2024 | 10:00 to 16:00 | Unix-like systems, bash, connecting to SCWales |
| Tue 04/11/2024 | 10:00 to 16:00 | More bash, SCWales, using SCWales and slurm |
| Wed 05/11/2024 | 10:00 to 16:00 | Illumina data, BEARCAVE, data processing |
| Thu 06/11/2024 | 10:00 to 16:00 | ANGSD, covariance and distance matrices, heterozygosity, intro to R |
| Fri 07/11/2024 | 10:00 to 16:00 | Maps, PCAs, NJ trees, Manhattan plots and R Markdown |
https://drabarlow.github.io/bioinformatics_bootcamp/
https://drabarlow.github.io/bioinformatics_bootcamp/bootcamp_worksheet_2025.html
https://github.com/drabarlow/bioinformatics_bootcamp
bash and R.
bash and R.
sudo)
Mac OS
Linux
R (typically via RStudio)
bash
DOS and Unix not yet possible

| | Windows | Mac | Linux |
|---|---|---|---|
| standard PC functions | yes | yes | yes |
| cost | yes | yes | free |
| hardware choice | yes | no | yes |
| bioinformatics | no | yes | yes |
| HPC | no | no | yes |
| open source | no | no | yes |
| active community | no | no | yes |
| games | yes | no | no |
Bourne shell (sh), developed by Stephen Bourne in 1979
Bourne Again Shell (bash)
bash or something like it
Secure Shell (ssh)
scp or sftp
slurm job scheduler
modules

Connecting to the jump host (with MFA)
ssh you25usr@ssh.bangor.ac.uk
Note: most UNIX systems do not show anything when you’re typing your password!
If successful, connecting to Hawk
ssh b.you25usr@hawklogin.cf.ac.uk
Raise your hand if you are having issues 🙌
/ [root] is the uppermost level of the filesystem
/
working directory
/home/b.xlb21brx/
/scratch/b.xlb21brx/
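As a rough illustration of moving between a home and a scratch directory, the sketch below builds a throwaway copy of that layout inside a temporary folder, so it can be run anywhere without touching the cluster. The sandbox and the placeholder username `b.you25usr` are assumptions made for the example; on the real system the paths start at `/`.

```shell
#!/usr/bin/env bash
# Build a throwaway directory tree that mimics the layout above
# (hypothetical sandbox; the real /home and /scratch live on the cluster).
sandbox=$(mktemp -d)
mkdir -p "$sandbox/home/b.you25usr" "$sandbox/scratch/b.you25usr"

# Absolute path: spelled out in full from the top of the tree.
cd "$sandbox/home/b.you25usr"
start_dir=$(basename "$(pwd)")      # pwd prints the working directory

# Relative path: ".." means "one level up" from the working directory.
cd ../../scratch/b.you25usr
end_parent=$(basename "$(dirname "$(pwd)")")

echo "started in: $start_dir"
echo "now under:  $end_parent"
```

The same `cd`, `pwd` and `..` steps apply unchanged once you are logged in; only the starting paths differ.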
slurm

| Platform | Million reads | Read length | Gb data | Genome coverage |
|---|---|---|---|---|
| iSeq | 4 | 2 x 150 bp | 1.2 | 0.4x |
| MiniSeq | 25 | 2 x 150 bp | 7.5 | 2.5x |
| MiSeq | 100 | 2 x 500 bp | 30 | 10x |
| NextSeq 550 | 400 | 2 x 150 bp | 120 | 40x |
| NextSeq 1000/2000 | 1800 | 2 x 300 bp | 540 | 180x |
| NovaSeq 6000 | 20000 | 2 x 250 bp | 3000 | 1000x |
| NovaSeq X | 52000 | 2 x 150 bp | 8000 | 2667x |
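The coverage column appears to be the data yield divided by the genome size. A minimal sketch of that arithmetic follows; the 3 Gb genome size is an assumption inferred from the table itself (for example 1.2 Gb giving 0.4x), not a value stated in the slides, and `coverage` is a helper name invented for the example.

```shell
#!/usr/bin/env bash
# Coverage = total sequenced bases / genome size.
# genome_gb=3 is an ASSUMPTION inferred from the table (1.2 Gb -> 0.4x).
genome_gb=3

coverage() {
    # $1 = Gb of data; prints fold-coverage to one decimal place
    awk -v gb="$1" -v g="$genome_gb" 'BEGIN { printf "%.1f\n", gb / g }'
}

iseq=$(coverage 1.2)          # iSeq row
nextseq550=$(coverage 120)    # NextSeq 550 row
echo "iSeq: ${iseq}x, NextSeq 550: ${nextseq550}x"
```

Changing `genome_gb` to the size of your own study species gives a quick estimate of how many samples fit on one run at a target depth.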
*Indexes allow multiple samples to be sequenced at the same time
[Not an exhaustive list]
Short reads from a single individual can be mapped to a reference genome assembly
| sample | locality |
|---|---|
| adder01-04 | Dublin |
| adder05-08 | Belfast |
| adder09-12 | Cork |
| adder13-16 | Limerick |
| adder17-20 | Galway |
| adder21-24 | Dundalk |
| adder25-27 | Bray |
| adder28 | outgroup |
@A00551:758:HKTVJDSX7:4:1101:3595:6872 1:N:0:CCTGAGATGT+GGTCTAGTTG
CTGAATATGGATTTTAATTGAATCCTAAGATATTATAGCATCTTTCACTCCCTGTCCTGTGCATGTCAGA
+
FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF
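Each FASTQ record spans four lines: header, bases, a `+` separator, and one quality character per base. The sketch below counts reads and bases in a tiny made-up file and decodes the quality character `F`, assuming the Phred+33 encoding used by current Illumina output. The two reads are hypothetical, not course data.

```shell
#!/usr/bin/env bash
# Hypothetical two-read FASTQ file, written to a temporary location.
fq=$(mktemp)
cat > "$fq" <<'EOF'
@read1
ACGTACGT
+
FFFFFFFF
@read2
ACGT
+
FFFF
EOF

# Every record is 4 lines; line 2 of each record holds the bases.
n_reads=$(awk 'NR % 4 == 2 { n++ } END { print n }' "$fq")
n_bases=$(awk 'NR % 4 == 2 { b += length($0) } END { print b }' "$fq")

# Quality characters encode Phred scores as ASCII code minus 33
# (Phred+33 is assumed here).
phred_F=$(( $(printf '%d' "'F") - 33 ))

echo "reads: $n_reads  bases: $n_bases  Phred score of F: $phred_F"
rm -f "$fq"
```

For a compressed file the same `awk` commands can read from `zcat file.fastq.gz` instead of a plain file.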
45 ka cave bear (Ursus kudarensis)
Cutadapt
FLASH

Expected output in /BEARCAVE2/trimdata/*processing/
*_mappable.fastq.gz [big file]
*_mappable_R1.fastq.gz [big file]
*_mappable_R2.fastq.gz [big file]
*_trim_report.log and merge report *_merge_report.log
bwa mem algorithm
samtools
samtools

Expected output in /BEARCAVE2/mapped*/*processing/
*.bam [big file]
*.bam.bai
*_mapping.log
plink, admixtools, etc)

| Allele1 | Allele2 | prob11 | prob12 | prob22 |
|---|---|---|---|---|
| A | T | 0.05 | 0.9 | 0.05 |
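To make the three probabilities concrete, the sketch below takes the single site shown in the table and derives two summaries: the most probable genotype, and the expected number of copies of Allele2, which keeps the uncertainty instead of committing to a hard call. This is only an illustration of the arithmetic; it does not reproduce how ANGSD or any downstream tool handles these values internally.

```shell
#!/usr/bin/env bash
# One site in the table layout above: two alleles, then the probabilities
# of the three possible genotypes (11, 12, 22).
site="A T 0.05 0.9 0.05"

# Most probable genotype: pick the largest of the three values.
best=$(echo "$site" | awk '{
    g[1] = $1 $1; g[2] = $1 $2; g[3] = $2 $2   # genotype labels
    max = 1
    for (i = 2; i <= 3; i++) if ($(i + 2) > $(max + 2)) max = i
    print g[max]
}')

# Expected copies of Allele2: 0*prob11 + 1*prob12 + 2*prob22.
dosage=$(echo "$site" | awk '{ printf "%.2f\n", $4 + 2 * $5 }')

echo "most probable genotype: $best"
echo "expected copies of $(echo "$site" | cut -d" " -f2): $dosage"
```

Here the heterozygote is clearly favoured, so both summaries agree; at low coverage the three values are often much closer together and the hard call becomes unreliable.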
NGSadmix
PCangsd
NGSrelate
realSFS

Covariance matrix
Distance matrix
Heterozygosity
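For a single diploid sample, heterozygosity can be read off a site frequency spectrum as the share of sites carrying exactly one copy of the alternative allele. The sketch below assumes the spectrum is one line of three values (sites with 0, 1 and 2 copies), which is the layout realSFS is generally described as producing for one sample. The numbers are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical single-sample SFS: expected numbers of sites with
# 0, 1 and 2 copies of the alternative allele (made-up values).
sfs="98000 1500 500"

# Heterozygosity = heterozygous sites / all sites.
het=$(echo "$sfs" | awk '{ printf "%.4f\n", $2 / ($1 + $2 + $3) }')
echo "heterozygosity: $het"
```

With a real run, the `sfs` string would be replaced by the contents of the file written by realSFS for that sample.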
realSFS
R
RStudio
R works
RStudio
R
tidyverse
R markdown
git) and other development tools
RStudio
Tidyverse
ggplot2
tibble
tidyr
readr
dplyr
stringr
purrr
forcats
R from the heretics
Most people disagree (in some cases strongly)
R is really good!
tidyverse is not the way to teach R to beginners
ggplot2 code is restrictive

Objects
<-

Functions
function()
?function

Vector
c()
my_vector[]

Matrix
my_matrix[row, column]

Dataframe
$, which can then be indexed like vectors

List
$

rworldmap and sf
eigen()
ape library

See you next year :)